max rank | avg. rank | sentence |
---|---|---|
208 | 75.3000 | It was really important to her to have a life. |
216 | 66.8182 | What is up is down and what is down is up. |
221 | 108.9091 | And people do, they need and want their data back.” |
240 | 87.0000 | So it is important that this information is made available. |
246 | 103.1250 | What are the first things they should see? |
252 | 94.7857 | What are we as a government and a community going to do now? |
269 | 142.0000 | So where does it start and where does it end? |
274 | 78.2727 | We will always be here for you if you need us. |
281 | 77.2222 | “I think it needs to be a different time. |
285 | 130.8750 | A school is not a school without students. |
289 | 151.8333 | New products and new business development. |
298 | 72.3333 | I can be sure that they would not be the same. |
298 | 83.6364 | If so, not sure what we could do about it now? |
300 | 112.3636 | At the same time, what they want is you.” |
308 | 77.2222 | But there is more to this site than that. |
319 | 123.8182 | However there is still much work that needs to be done. |
321 | 149.2500 | But again, early days for the research there. |
326 | 121.3333 | So that is what I did for the next 4 years. |
333 | 92.4500 | And they don't say how they are going to get them or what they will do, if they get them. |
333 | 111.2000 | It is great that you don't have any health issues. |
341 | 151.6667 | How long should it take before I see results? |
348 | 110.7273 | I am in Melbourne but will see what I find. |
353 | 151.6364 | And it's good because there are a few things I… |
359 | 95.7000 | What do you need, and why do you need it? |
367 | 158.1250 | For him without women, there was no life. |
375 | 115.4444 | He was in this role for about four years. |
383 | 116.2000 | And it’s not the best - but it is good. |
388 | 166.7500 | And these things are not even big things. |
401 | 128.7857 | For the cost of the business, we don't really have that much of cost. |
424 | 151.5455 | So, this was the problem; this is still the problem today. |
The maximum word rank of a sentence is, by definition, the rank of the rarest word in the sentence. If it is low, every word in the sentence is a high-frequency word. For this reason, the table of sentences with the lowest maximum word rank may be of interest. The table lists these sentences, restricted to those longer than 40 characters.
The overall distribution of the maximum rank over all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
The sentences in the table are of interest because they are usually easy to understand. The distribution may give insight into the corpus and may provide parameters for language comparison.
While the distribution can be estimated from a small corpus, sentences like those in the table are rare, so a large corpus will give more impressive results.
Table data:
-- w_id values up to 100 are skipped; 100 is subtracted to obtain the rank
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence
from sentences s, inv_w i
where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100
group by s.s_id
order by m
limit 30;
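To try the table query without access to the corpus database, the following is a minimal sketch that runs it against an in-memory SQLite database from Python. The table and column names (`sentences`, `inv_w`, `s_id`, `w_id`, `sentence`) are taken from the query; the column types, the sample sentences and the word ids are invented for illustration and do not come from the corpus. The original database engine may differ from SQLite.

```python
import sqlite3

# Hypothetical toy data, not taken from the corpus.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
    CREATE TABLE inv_w (w_id INTEGER, s_id INTEGER);
""")
conn.executemany("INSERT INTO sentences VALUES (?, ?)", [
    (1, "A made-up sentence that is longer than forty characters."),
    (2, "Another invented sentence, also longer than forty characters."),
    (3, "Too short to qualify."),
])
conn.executemany("INSERT INTO inv_w VALUES (?, ?)", [
    (5, 1), (110, 1), (150, 1), (400, 1),  # w_id 5 is dropped by w_id > 100
    (120, 2), (130, 2), (200, 2),
    (101, 3), (102, 3),                    # sentence 3 is dropped by its length
])

# Same logic as the table query: per sentence, the maximum and the
# average of (w_id - 100), sorted by the maximum, lowest first.
QUERY = """
    SELECT MAX(i.w_id) - 100 AS m, AVG(i.w_id) - 100 AS a, s.sentence
    FROM sentences s JOIN inv_w i ON s.s_id = i.s_id
    WHERE LENGTH(s.sentence) > 40 AND i.w_id > 100
    GROUP BY s.s_id
    ORDER BY m
    LIMIT 30
"""
rows = conn.execute(QUERY).fetchall()
for m, a, sentence in rows:
    print(m, round(a, 4), sentence)
```

With this toy data, the second sentence comes first because its rarest word has the lower rank.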
Distribution data:
-- the maximum rank of each sentence is rounded to a multiple of 100, then counted
select m, count(*)
from (select 100* round((max(w_id)-100)/100) as m
      from sentences s, inv_w i
      where s.s_id=i.s_id and i.w_id>100
      group by s.s_id) aa
group by m;
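The binning step of the distribution query can be sketched in plain Python as follows. The function names are hypothetical, and the sketch assumes that `round` in the query rounds halves upward (so 250 falls into bin 300); whether this holds depends on the database engine. The first two sample values are maximum ranks from the table above; the rest are made up.

```python
from collections import Counter


def bin_to_hundreds(rank: int) -> int:
    """Round a positive rank to the nearest multiple of 100 (halves round up)."""
    return (rank + 50) // 100 * 100


def max_rank_histogram(max_ranks):
    """Count how many sentences fall into each bin of maximum rank."""
    counts = Counter(bin_to_hundreds(r) for r in max_ranks)
    return dict(sorted(counts.items()))


# 208 and 216 appear in the table; the other values are invented.
sample = [208, 216, 249, 250, 1234, 1260, 98765]
hist = max_rank_histogram(sample)
for bin_value, n in hist.items():
    print(bin_value, n)
```

Because the bins have a fixed width of 100, they appear increasingly dense towards the right of a log-scaled x-axis, which is worth keeping in mind when reading the diagram.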
Explain the distribution, especially the increase in its right part.
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low frequency words I
4.5.2.4 Sentences consisting of many low frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences consisting of long words only II